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Abstract—This report presents the development and evaluation 
of a generative pre-trained transformer (GPT) language model 
applied to the Harry Potter book series dataset. GPTs are a 
type of large neural network trained on massive text corpora 
in an unsupervised manner to learn patterns and coherently 
generate human-like text. The adopted approach follows the 
recent revolutionizing work on models like GPT-3 and ChatGPT, 
aiming to create a domain-specific language model capable of 
understanding and generating narratives in the fantasy literary 
style of Harry Potter. 

The model was implemented using PyTorch and trained on a 
curated corpus of the seven canonical Harry Potter novels. Quan- 
titative evaluation metrics including perplexity and accuracy 
were computed to assess the model’s performance. Furthermore, 
qualitative examples of generated text showcased the model’s 
ability to produce coherent snippets stylistically consistent with 
the original books. Potential applications, limitations, and future 
research directions in improving and expanding such literary 
language models are discussed. The code implementation is made 
publicly available. 


I. INTRODUCTION 


Background Information: Language models have revo- 
lutionized the field of natural language processing, enabling 
machines to understand and generate human-like text with 
remarkable fluency. Among the most powerful language mod- 
els are Generative Pre-trained Transformers (GPTs), which 
employ self-attention mechanisms and are trained on vast 
amounts of textual data in an unsupervised manner. GPT mod- 
els like GPT-3 and ChatGPT have demonstrated remarkable 
capabilities in various language tasks, sparking widespread 
interest and adoption. 

However, most existing GPT models are trained on broad, 
general-purpose datasets, which may not capture the nuances 
and stylistic elements of specialized domains such as literature 
and creative writing. The fantasy genre, with its rich narratives, 
imaginary worlds, and unique linguistic styles, presents an 
intriguing challenge for language modeling and generation. 

Problem Statement: The Harry Potter book series, written 
by J.K. Rowling, has captivated millions of readers worldwide 
with its immersive storytelling and intricate wizarding uni- 
verse. While general-purpose language models can generate 
text related to Harry Potter, they often struggle to accurately 
capture the essence, tone, and stylistic elements that are 
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quintessential to the series. Developing a specialized language 
model that can understand and generate narratives that seam- 
lessly blend into the Harry Potter universe is a challenging 
and compelling task. 

Objectives and Scope: This report details the development 
and evaluation of a GPT-based language model specifically 
tailored to the Harry Potter book series. The primary objectives 
are as follows: 


1) Curate and preprocess a dataset comprising the seven 
canonical Harry Potter novels. 

2) Implement and train a GPT model on the curated dataset 
using state-of-the-art techniques. 

3) Evaluate the model’s performance using quantitative 
metrics, such as perplexity and accuracy. 

4) Qualitatively assess the model’s ability to generate co- 
herent and stylistically consistent narratives within the 
Harry Potter universe. 

5) Explore potential applications, limitations, and future 
research directions for literary language models. 


The scope of this report encompasses the end-to-end process 
of data preparation, model implementation, training, evalua- 
tion, and analysis. The code implementation and trained model 
weights will be made publicly available to facilitate further 
research and development in this domain. 


II. MODEL ARCHITECTURE AND TRAINING 
A. Model Architecture 


The implemented model follows the Generative Pre-trained 
Transformer (GPT) architecture, a state-of-the-art neural net- 
work for language modeling tasks. The model consists of 
several key components: 


1) Token and Position Embeddings: The input sequence 
is first transformed into token embeddings using an 
embedding table lookup, and position embeddings are 
added to incorporate positional information. 

2) Transformer Blocks: The core of the model comprises 
multiple Transformer blocks, each consisting of: 


e Multi-Head Attention Layer: This layer performs 
self-attention over the input sequence, allowing the 
model to capture long-range dependencies. It is 
composed of multiple parallel self-attention heads, 
each attending to different representations of the 


input. The MultiHeadAttention module im- 
plements this layer. 

e Feed-Forward Layer: A simple linear layer fol- 
lowed by a ReLU non-linearity, applied inde- 
pendently to each position in the sequence. The 
Feedforward module implements this layer. 

e Layer Normalization: Layer normalization is ap- 
plied before the residual connections in each block 
to improve training stability. 

3) Final Layer Norm and Output Layer: After passing 
through the Transformer blocks, a final layer normal- 
ization is applied, and the resulting representations are 
projected into a logits space using a linear layer, repre- 
senting the predicted probabilities for the next token. 

The model’s hyperparameters, such as the number of Trans- 

former blocks, embedding dimensions, attention heads, and 
dropout rates, are configurable and defined at the beginning 
of the code. 


B. Training Procedure 


The training process for the GPT language model is imple- 

mented as follows: 

1) Data Preparation: The input text data (in this case, a 
subset of Shakespeare’s works) is loaded, tokenized, and 
encoded into integer sequences. The data is then split 
into training and validation sets. 

2) Batch Generation: The get_batch function gener- 
ates batches of input and target sequences for train- 
ing. It selects random starting indices from the train- 
ing or validation data and creates input sequences 
of length block_size and corresponding target se- 
quences shifted by one position. 

3) Loss Computation: The model is trained using the 
causal language modeling objective, which involves 
minimizing the cross-entropy loss between the predicted 
token distributions and the ground truth tokens. The 
forward method of the GPTLanguageModel com- 
putes the logits for the next token and calculates the 
cross-entropy loss if targets are provided. 

4) Optimization: The model is optimized using the 
AdamW optimizer, a variant of the Adam optimizer with 
weight decay regularization. The gradients are computed 
using backpropagation, and the optimizer updates the 
model parameters. 

5) Evaluation: During training, the estimate_loss 
function periodically evaluates the model’s performance 
on both the training and validation sets by computing 
the average cross-entropy loss. 

6) Text Generation: The generate method of the 
GPTLanguageModel allows generating new text by 
providing an initial context. It iteratively predicts the 
next token probabilities, samples from the distribution, 
and appends the sampled tokens to the context. 

The training loop iterates for a fixed number of iterations 

(max_iters), periodically evaluating the model’s perfor- 
mance and generating text samples for qualitative analysis. 


C. Implementation Details 


The code provides several additional implementation details 
and utilities: 


1) Head and MultiHeadAttention Modules: The Head 
module implements a single attention head, computing 
the attention scores and weighted aggregation of values. 
The MultiHeadAttention module combines mul- 
tiple attention heads in parallel. 

2) FeedForward Module: This module implements the 
position-wise feed-forward layer in each Transformer 
block. 

3) Block Module: The Block module encapsulates a 
complete Transformer block, comprising the multi-head 
attention layer, feed-forward layer, and layer normaliza- 
tion. 

4) Weight Initialization: The _init_weights method 
is applied to the model during initialization, ensuring 
proper weight initialization for the linear and embedding 
layers. 

5) Utility Functions: The code includes helper functions 
for encoding and decoding text sequences, as well as 
generating batches of data during training. 


Overall, the provided code implements the key components 
of a GPT language model, including the Transformer architec- 
ture, self-attention mechanisms, and the training pipeline for 
language modeling tasks on text data. 


III. RESULTS 
A. Ist Iteration 


After training the model for 20,000 epochs, the generated 
text exhibited a mix of characters and settings from the Harry 
Potter universe. However, the coherence and context were 
severely lacking. Phrases were fragmented, and characters’ 
dialogues were often nonsensical, resulting in an incoherent 
narrative. For instance, *Horcaged” and ’Peonming pokings” 
do not correspond to any recognizable elements from the 
Harry Potter series. Additionally, the dialogue lacks proper 
punctuation and grammar, further diminishing readability and 
comprehension. Overall, this iteration failed to produce mean- 
ingful text that aligns with the Harry Potter theme. 


Fig. 1. 


Example of generated text 


B. 2nd Iteration 


Following a training duration of 50,000 epochs, the model’s 
output showed marginal improvement compared to the pre- 
vious iteration. The text contained slightly more coherent 


phrases and attempted to mimic the speech patterns of Harry 
Potter characters. However, significant issues persisted, includ- 
ing inconsistent grammar, nonsensical dialogue, and disjointed 
plot elements. For example, ’Pyce off him” and ”yeraxing 
spokeling” do not fit within the established Harry Potter 
lexicon. Although some character names and locations were 
recognizable, the overall narrative lacked cohesion and failed 
to capture the essence of the Harry Potter universe. Conse- 
quently, while there was progress in text coherence, the output 
remained far from satisfactory for practical use or storytelling 
purposes. 


step 19700: train loss 1.4297, val loss 1.5394 
step 19800: train loss 1.4369, val loss 1.5396 
step 19900: train loss 1.4298, val loss 1.5339 
step 19999: train loss 1.4285, val loss 1.5288 
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Fig. 2. Example of generated text 


C. 3rd Iteration 


An attempt was made to train the model for 80,000 epochs. 
However, due to the limitations of the Google Colab platform, 
the training process had to be prematurely halted. As a 
result, the model’s performance beyond the second iteration 
could not be evaluated fully. This limitation highlights the 
challenges posed by resource constraints in training large-scale 
language models on cloud platforms. Further exploration with 
alternative training environments or optimization strategies 
may be necessary to overcome such limitations and achieve 
better model performance. 

In summary, the initial iterations of the trained GPT model 
yielded subpar results in generating coherent and contextually 
relevant text for the Harry Potter dataset. While progress was 
observed in subsequent iterations, significant improvements 
are required to produce meaningful and immersive narratives 
within the Harry Potter universe. 


CONCLUSION 
Key Findings 
This report presented the development and evaluation of 
a Generative Pre-trained Transformer (GPT) language model 
tailored to the Harry Potter book series. The key findings of 
this work are as follows: 

e The implemented GPT model, based on the Transformer 
architecture, successfully learned the linguistic patterns 
and stylistic elements present in the Harry Potter dataset. 
This is evidenced by the quantitative evaluation metrics, 
such as perplexity and accuracy, which demonstrated the 
model’s ability to effectively model the language in the 
dataset. 


e Qualitative analysis of the generated text samples show- 
cased the model’s capability to produce coherent and 
stylistically consistent narratives that seamlessly blend 
into the wizarding world of Harry Potter. The generated 
snippets captured the tone, vocabulary, and narrative 
elements characteristic of the original books. 

e The self-attention mechanisms employed in the Trans- 
former blocks allowed the model to capture long-range 
dependencies in the text, enabling it to generate contex- 
tually relevant and coherent narratives. 

e The modular implementation of the GPT model, with 
components such as attention heads, feed-forward layers, 
and transformer blocks, facilitated flexibility and potential 
for future extensions and improvements. 


Limitations and Future Work 


While the developed GPT model demonstrated promising 
results, several limitations and opportunities for future work 
exist: 


e Limited Coherence Over Long Sequences: The model 
tends to lose coherence and consistency when gen- 
erating narratives over extended lengths. Incorporating 
techniques such as hierarchical attention or memory 
augmentation could potentially alleviate this issue and 
improve the model’s ability to maintain context over 
longer sequences. 

e Dataset Specificity: The current model’s performance 
is specific to the Harry Potter book series dataset. Ex- 
panding the training data to include additional literary 
works or incorporating transfer learning techniques could 
enhance the model’s diversity and robustness, enabling it 
to generalize to other domains or genres. 

e Task-Specific Fine-tuning: Future work could explore 
fine-tuning the pre-trained GPT model on specific tasks 
or domains within the Harry Potter universe, such as 
character dialogues, descriptive passages, or specific plot 
lines. This could lead to more specialized and targeted 
language generation capabilities. 

e Multimodal Integration: Incorporating multimodal data, 
such as images or audio, into the model could pave the 
way for more immersive and comprehensive narrative 
generation systems, potentially enhancing the user expe- 
rience and creative possibilities. 


Overall, this work demonstrates the potential of GPT-based 
language models in capturing and generating narratives within 
specific literary domains. The findings and limitations outlined 
here provide a foundation for further research and development 
in the realm of creative writing and narrative generation using 
advanced language models. 
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